Open source optical character recognition for historical research
نویسندگان
چکیده
منابع مشابه
Integrating Optical Character Recognition and Machine Translation of Historical Documents
Machine Translation (MT) plays a critical role in expanding capacity in the translation industry. However, many valuable documents, including digital documents, are encoded in non-accessible formats for machine processing (e.g., Historical or Legal documents). Such documents must be passed through a process of Optical Character Recognition (OCR) to render the text suitable for MT. No matter how...
متن کاملOptical Character Recognition by Open source OCR Tool Tesseract: A Case Study
Optical character recognition (OCR) method has been used in converting printed text into editable text. OCR is very useful and popular method in various applications. Accuracy of OCR can be dependent on text preprocessing and segmentation algorithms. Sometimes it is difficult to retrieve text from the image because of different size, style, orientation, complex background of image etc. We begin...
متن کاملOptical Character Recognition
This paper describes two implementations in optical character recognition using template matching method and feature extraction method followed by support vector machine classification. With proper image preprocessing, the texts are segmented into isolated characters and the correlations between a single character and a given set of templates are computed to find the similarities and then ident...
متن کاملOptical Character Recognition Systems
Abstract Optical character recognition (OCR) is process of classification of optical patterns contained in a digital image. The character recognition is achieved through segmentation, feature extraction and classification. This chapter presents the basic ideas of OCR needed for a better understanding of the book. The chapter starts with a brief background and history of OCR systems. Then the di...
متن کاملOptical Character Recognition
In this paper we present for the first time, the development of a new system for the off-line optical recognition of the characters used in the Orthodox Hellenic Byzantine Music Notation, that has been established since 1814. We describe the structure of the new system and propose algorithms for the recognition of the 71 distinct character classes, based on Wavelets, 4-projections and other str...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Journal of Documentation
سال: 2012
ISSN: 0022-0418
DOI: 10.1108/00220411211256021